Effect of For and Against Goals

Analysis

What predictive power do the draw-groups have in combination with for and against goals?
Author

Samuel

Published

August 25, 2025

TLDR

  1. Adding the average goals for and average goals against per league position improves the predictions made with the ranking groups

Digging deeper with regressions

In the previous article, the main finding was that:

  • there doesn’t seem to be an obvious relationship between position and draws

Finding little is still a finding: at least it confirms that draws are hard to predict. So in this article I bring in some more formal tools to dig below the surface.


Call:
lm(formula = Draws ~ Draws_Category, data = snapshots_all_seasons_34_regr)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.8542 -1.7847 -0.1389  1.2153  8.2153 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 8.8542     0.1965  45.058  < 2e-16 ***
Draws_Categoryhist_low     -2.0694     0.2779  -7.447  5.3e-13 ***
Draws_Categoryhist_medium  -0.7153     0.2779  -2.574   0.0104 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.358 on 429 degrees of freedom
Multiple R-squared:  0.1177,    Adjusted R-squared:  0.1136 
F-statistic: 28.61 on 2 and 429 DF,  p-value: 2.176e-12

To dig deeper, I ran the regression above. It tests whether the 3 categories (high, medium and low) predict a team's total number of draws at the final round. So how large is the effect of each group, and is it significant?

  • All coefficients are significant (p < 0.05), so the differences are unlikely to be due to chance alone

    • High group: the reference level, captured by the intercept (8.85 draws on average)

    • Medium group: 0.72 fewer draws than the high group (about 8.14)

    • Low group: 2.07 fewer draws than the high group (about 6.78)

  • R-squared = 0.118: these categories explain only about 12% of the variance (adjusted R-squared 11%)

    • still a decent result: worth taking into account when predicting
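The lm() output uses R's default treatment coding. The first level in alphabetical order (hist_high) becomes the reference, so the intercept is the high group's mean and each other coefficient is that group's difference from it. The sketch below reproduces that logic in Python with NumPy on synthetic, made-up data; the group means, spread and group sizes are assumptions, not the article's data. It checks that the least-squares intercept equals the reference-group mean and computes R-squared as 1 minus residual over total sum of squares.

```python
import numpy as np

# Synthetic data only: the true means and spread below are made up for illustration.
rng = np.random.default_rng(42)
levels = ["hist_high", "hist_low", "hist_medium"]  # alphabetical, as R orders factor levels
true_means = {"hist_high": 8.9, "hist_low": 6.8, "hist_medium": 8.1}
n_per_group = 144

category = np.repeat(levels, n_per_group)
draws = np.concatenate(
    [rng.normal(true_means[lv], 2.4, n_per_group) for lv in levels]
)

# Treatment (dummy) coding: an intercept column plus one 0/1 indicator
# per non-reference level; hist_high is the reference and gets no column.
X = np.column_stack([
    np.ones(draws.size),
    (category == "hist_low").astype(float),
    (category == "hist_medium").astype(float),
])
beta, *_ = np.linalg.lstsq(X, draws, rcond=None)

group_means = {lv: draws[category == lv].mean() for lv in levels}

# R-squared = 1 - residual sum of squares / total sum of squares
residuals = draws - X @ beta
r_squared = 1.0 - (residuals @ residuals) / np.sum((draws - draws.mean()) ** 2)

print("intercept (hist_high mean):", round(beta[0], 3))
print("hist_low - hist_high:      ", round(beta[1], 3))
print("hist_medium - hist_high:   ", round(beta[2], 3))
print("R-squared:                 ", round(r_squared, 3))
```

Applied to the real output, the same reading gives group means of 8.85 (high), 8.85 - 2.07 = 6.78 (low) and 8.85 - 0.72 = 8.14 (medium).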

Random intercept model

I want to test these 3 groups a bit more using a random intercept model. Such a model gives each group its own starting point (intercept), drawn from a common distribution around a grand mean. So how do the 3 categories behave if I treat them as a random effect?

Linear mixed model fit by REML ['lmerMod']
Formula: Draws ~ (1 | Draws_Category)
   Data: snapshots_all_seasons_34_regr

REML criterion at convergence: 1975.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8941 -0.7725 -0.0557  0.4997  3.4670 

Random effects:
 Groups         Name        Variance Std.Dev.
 Draws_Category (Intercept) 1.066    1.032   
 Residual                   5.561    2.358   
Number of obs: 432, groups:  Draws_Category, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   7.9259     0.6068   13.06

Using a random intercept for the 3 groups shows the following:

  • Grand mean = 7.93 draws

  • Differences between categories exist, but most variability is within categories (residual variance 5.561 vs. 1.066 between categories)

  • Categories explain about 16% of the variance in this model: 1.066 / (1.066 + 5.561) ≈ 0.16, the intraclass correlation
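The grand mean of 7.93 equals the plain average of the three group means implied by the lm() output, so the design looks balanced. The sketch below assumes 144 teams per category (432 / 3). Under that assumption, the REML between-group variance in a one-way random intercept model coincides with the classical ANOVA estimator whenever that estimate is positive. This means the lmer() variance components can be recovered by hand from the lm() output. A small Python sketch using only the reported coefficients:

```python
# Group means implied by the lm() coefficients (hist_high is the reference level).
intercept = 8.8542
group_means = [intercept, intercept - 2.0694, intercept - 0.7153]  # high, low, medium
n_per_group = 144          # assumption: 432 teams split evenly over 3 categories
k = len(group_means)
msw = 2.358 ** 2           # within-group mean square = squared residual SE from lm()

grand_mean = sum(group_means) / k
# Between-group mean square for a balanced one-way layout
msb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)

# Balanced one-way ANOVA estimator of the between-category variance
var_category = (msb - msw) / n_per_group
# Intraclass correlation: share of total variance that sits between categories
icc = var_category / (var_category + msw)

print(f"grand mean   = {grand_mean:.4f}")
print(f"F statistic  = {msb / msw:.2f}")
print(f"var category = {var_category:.3f}")
print(f"ICC          = {icc:.3f}")
```

With these inputs the sketch gives an F statistic of about 28.61, matching the lm() output. It also gives a category variance of about 1.066 and an ICC of about 0.16, in line with the lmer() fit.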

This confirms that there is some predictive truth to these categories. However, the model describes a world that contains nothing but ranking categories and draws. That is far removed from everything that determines whether a match ends in a draw, and many variables are left out of the model.

Average number of goals for and against

To take this a bit further, I want to see how these random effects play together with other variables. Draws don't happen in a vacuum: they are the result of teams not creating a goal difference during the matches they play. So I want goals to enter the equation. Specifically, I will add the average number of goals for and the average number of goals against (per league rank, per final round, per season). My thinking is that these goal averages indicate a team's ability to create or resist goal differences. I will combine these variables with the random effect of the 3 groups.

Linear mixed model fit by REML ['lmerMod']
Formula: Draws ~ Avg_Goals_For + Avg_Goals_Against + (1 | Draws_Category)
   Data: snapshots_all_seasons_34_regr

REML criterion at convergence: 1943.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9825 -0.6549 -0.0702  0.5909  4.1675 

Random effects:
 Groups         Name        Variance Std.Dev.
 Draws_Category (Intercept) 0.7058   0.8401  
 Residual                   5.1840   2.2768  
Number of obs: 432, groups:  Draws_Category, 3

Fixed effects:
                  Estimate Std. Error t value
(Intercept)        12.9752     1.0115  12.827
Avg_Goals_For      -1.6630     0.2960  -5.619
Avg_Goals_Against  -1.6446     0.3415  -4.816

Correlation of Fixed Effects:
            (Intr) Av_G_F
Avg_Gols_Fr -0.775       
Avg_Gls_Agn -0.800  0.637

Great stuff. This model in short:

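  • Both scoring more and conceding more reduce draws: a one-unit increase in Avg_Goals_For is associated with about 1.66 fewer draws, and a one-unit increase in Avg_Goals_Against with about 1.64 fewer draws

  • Categories (Draws_Category) still matter, but their effect is small compared to scoring/conceding goals: the category variance shrinks from 1.066 to 0.706 once goals are in the model

  • The residual variance drops from 5.561 to 5.184. The REML criterion also drops (1975.4 → 1943.9), but REML criteria are not directly comparable between models with different fixed effects. A formal comparison needs both models refit with maximum likelihood (lme4's anova() does this refit by default).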
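For nested models that differ only in their fixed effects, the usual comparison is a likelihood-ratio test on maximum-likelihood fits. In lme4, anova(model_0, model_1) refits both models with ML before testing, and logLik() on a model fitted with REML = FALSE returns the ML log-likelihood. Here model_0 and model_1 are hypothetical names for the two fits above. The Python sketch below shows the arithmetic once those two log-likelihoods are known. The goals model adds two parameters (Avg_Goals_For and Avg_Goals_Against), so the test has 2 degrees of freedom, and for 2 degrees of freedom the chi-square tail probability has the closed form exp(-x / 2). The log-likelihood values in the usage line are placeholders, not results from this analysis.

```python
import math


def lrt_p_value_df2(loglik_null: float, loglik_full: float) -> tuple[float, float]:
    """Likelihood-ratio test for nested models that differ by 2 parameters.

    Both log-likelihoods must come from maximum-likelihood (not REML) fits.
    For 2 degrees of freedom the chi-square survival function is exp(-x / 2),
    so no statistics library is needed.
    """
    statistic = 2.0 * (loglik_full - loglik_null)
    if statistic < 0:
        raise ValueError("the fuller model should not have a lower ML log-likelihood")
    return statistic, math.exp(-statistic / 2.0)


# Placeholder log-likelihoods (hypothetical, not from the article's models).
stat, p = lrt_p_value_df2(loglik_null=-990.0, loglik_full=-975.0)
print(f"LR statistic = {stat:.1f}, p = {p:.2e}")
```

For other differences in parameter count, the closed form no longer applies and a chi-square survival function from a statistics library would be needed.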

In football terms, this model confirms that teams that score and concede few goals tend to draw more matches.
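The fixed effects translate directly into predicted draws: 12.98, minus 1.66 per unit of Avg_Goals_For, minus 1.64 per unit of Avg_Goals_Against, plus the team's category deviation. The Python sketch below encodes that formula. The example inputs are hypothetical, and the category deviation defaults to 0 (an average category). In lme4, the estimated per-category deviations would come from ranef().

```python
# Fixed-effect estimates copied from the lmer() output above.
INTERCEPT = 12.9752
B_GOALS_FOR = -1.6630
B_GOALS_AGAINST = -1.6446


def predicted_draws(avg_goals_for: float, avg_goals_against: float,
                    category_deviation: float = 0.0) -> float:
    """Predicted final number of draws from the goals model.

    category_deviation is the random-intercept deviation of the team's
    Draws_Category; 0.0 corresponds to an average category.
    """
    return (INTERCEPT
            + B_GOALS_FOR * avg_goals_for
            + B_GOALS_AGAINST * avg_goals_against
            + category_deviation)


# Hypothetical teams: a tight, low-scoring side and an open, high-scoring one.
tight_team = predicted_draws(avg_goals_for=1.0, avg_goals_against=1.0)
open_team = predicted_draws(avg_goals_for=2.0, avg_goals_against=1.5)
print(f"tight team: {tight_team:.2f} draws, open team: {open_team:.2f} draws")
```

With these inputs the tight team comes out at about 9.67 draws and the open team at about 7.18. That is the "low scoring/conceding teams draw more" pattern expressed in numbers.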

Conclusions

  1. Who does all the drawing?
    • teams that score or concede few goals
    • positions 6, 7, 9, 10, 11, 12
  2. Is there a relationship between league ranking and drawing?
    • there is, but it has limited predictive power
    • significant (so no coincidence) but not a strong effect
  3. Limitations:
    • this analysis focused very strongly on the final round
    • took a deliberately narrow (rather myopic) focus on league position
    • this was a starting point to test and experiment and therefore lacks rigor across the board